Clustering Gene Expression Data Using an Effective Dissimilarity Measure

Authors

  • R. Das
  • D. K. Bhattacharyya
  • J. K. Kalita
Abstract

This paper presents two clustering methods: the first uses a density-based approach (DGC) and the second uses a frequent itemset mining approach (FINN). DGC uses regulation information as well as order-preserving ranking to identify relevant clusters in gene expression data. FINN exploits frequent itemsets and uses a nearest-neighbour approach for clustering gene sets. Both methods use a novel dissimilarity measure discussed in the paper. The clustering methods were evaluated on real-life datasets and were found to perform satisfactorily. They were also compared with some well-known clustering algorithms and found to perform well in terms of the homogeneity, silhouette and z-score cluster validity measures.
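Of the validity measures named in the abstract, the silhouette has a compact standard definition: for each item, s = (b - a) / max(a, b), where a is the mean dissimilarity to the other members of its own cluster and b is the lowest mean dissimilarity to any other cluster. The sketch below illustrates that textbook definition only; it is not the authors' code. It assumes Euclidean distance in place of the paper's own dissimilarity measure, and the function names and toy expression profiles are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_scores(points, labels):
    """Return the silhouette coefficient of every point.

    Assumes at least two distinct cluster labels. Euclidean distance is
    an illustrative stand-in, not the paper's dissimilarity measure.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette over all points; higher means tighter clusters."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Hypothetical toy data: 4 genes measured under 3 conditions.
profiles = [(1.0, 2.0, 3.0), (1.1, 2.1, 3.1), (3.0, 2.0, 1.0), (3.1, 2.1, 1.1)]
good = mean_silhouette(profiles, [0, 0, 1, 1])  # groups similar profiles
bad = mean_silhouette(profiles, [0, 1, 0, 1])   # mixes rising and falling genes
```

A clustering that groups the two rising profiles together and the two falling profiles together should score close to 1, while one that mixes them should score below 0. Any other dissimilarity function could be substituted for `dist` without changing the rest of the sketch.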


Similar resources

Clustering Gene Expression Data Using Random Forest Dissimilarity

Background: The clustering of gene expression data plays an important role in the diagnosis and treatment of cancer. Such data typically involve a large number of variables (genes) relative to the number of samples (patients). Many clustering methods are built on the dissimilarity among observations, calculated by a distance function. As increa...



Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...


A new approach for clustering gene expression time series data

Identifying groups of genes that manifest similar expression patterns is crucial in the analysis of gene expression time series data. Choosing a similarity measure to determine the similarity or distance between profiles is an important task. This paper proposes a suitable dissimilarity measure for gene expression time series data sets. It also presents a graph-based clustering method for findi...


Assessing Dissimilarity Measures for Sample-Based Hierarchical Clustering of RNA Sequencing Data Using Plasmode Datasets

Sample- and gene-based hierarchical cluster analyses have been widely adopted as tools for exploring gene expression data in high-throughput experiments. Gene expression values (read counts) generated by RNA sequencing technology (RNA-seq) are discrete variables with special statistical properties, such as over-dispersion and right-skewness. Additionally, read counts are subject to technology a...




Publication date: 2010